Abstract


We revisit the Flatland paradox proposed by Stone (1976), which is an example of non-conglomerability. The main novelty in the analysis of the paradox is to consider marginal vs conditional models rather than proper vs improper priors. We show that in the first model a prior distribution should be considered as a probability measure whereas, in the second one, a prior distribution should be considered in the projective space of measures. This induces two different kinds of limiting arguments which are useful to understand the paradox. We also show that the choice of a flat prior is not adapted to the structure of the parameter space and we consider an improper prior based on reference priors with nuisance parameters for which the Bayesian analysis matches the intuitive reasoning.

Keywords: Bayesian inference, flat prior, projective space, improper 
1 Introduction


Improper priors are commonly used in Bayesian statistics, especially when no prior information is available. However, using improper priors may lead to some inconsistencies between an intuitive or classical approach and a Bayesian analysis. The Flatland paradox, introduced by Stone (1976), is an example of such an inconsistency and has been widely commented on in the literature. The main argument put forward to explain the paradox is the improperness of the flat prior, which leads to non-conglomerability, see e.g. Schervish et al. (1984), Heath and Sudderth (1989), Jaynes (2003). It is also an example of inconsistency in the limiting behaviour of sequences of proper priors, such as uniform priors with a large range.


The aim of this paper is to propose a new way to analyse paradoxes based on the use of improper priors. Rather than considering proper vs improper priors, we prefer to consider two different Bayesian paradigms, one associated with the marginal model and the other with the conditional model. The way prior distributions and limiting arguments are handled is quite different from one paradigm to the other, and this difference can explain the paradox. In Section 2 we recall the statistical problem. In Section 3 we show that the choice of the flat prior is not adapted to the problem, in the sense that the parameter of interest used in the intuitive reasoning is not the whole parameter but a sub-parameter for which the induced prior distribution is not flat but highly informative. Then, we propose an improper prior that distinguishes between nuisance parameters and parameters of interest and which corresponds to the intuitive reasoning. In Section 4 we propose an analysis of the paradox by limiting arguments based on two different paradigms. We replace the flat prior by a sequence of proper priors and we examine the limit when the range tends to the whole parameter space. In the first paradigm there is no inconsistency, whereas in the second one the inconsistency remains even with proper priors, provided that we reconsider the interpretation of prior distributions and non-conglomerability.


2 The Flatland paradox


We give here a presentation of the model as given in Stone (1982). Consider a tetrahedral die which is tossed an unknown number of times, say $N$, which is probably large. The faces of the die are labelled $a$, $b$, $a^{-1}$ and $b^{-1}$. At each toss, the outcome is recorded subject to the rule that if the outcome is the inverse of the previous one, the two outcomes are removed, i.e. they annihilate each other. So, at the end, we get a path, denoted by $\theta$, with no consecutive inverse symbols. However, $\theta$ is not observed; instead, a supplementary toss is performed and the resulting path, denoted by $x$, is recorded following the same rule. Observing $x$, a statistician has to guess $\theta$.

Let $\Theta$ be the set of such finite paths. We denote by $x^-$ the path obtained from $x$ after removing the last outcome and by $A_x^+$ the set of the three possible paths obtained from $x$ by adding a symbol without annihilation. Denote by $A_x = A_x^+ \cup \{x^-\}$ the four possible paths obtained from $x$. In the special case where $x$ is the null path, denoted by $\emptyset$, $A_\emptyset = A_\emptyset^+$ is the set of the four possible paths of length 1 and $x^-$ is not defined. Similarly, we define $\theta^-$, $A_\theta^+$ and $A_\theta$. For example, if $\theta$ is the path $\ldots abaa$, then $A_\theta^+ = \{\ldots abaaa, \ldots abaab, \ldots abaab^{-1}\}$ and $\theta^- = \ldots aba$. The likelihood of the model is
model is 

$$l(\theta; x) = p(x|\theta) = \tfrac{1}{4}\,\mathbb{1}_{\theta \in A_x} = \tfrac{1}{4}\,\mathbb{1}_{x \in A_\theta}, \tag{1}$$

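where $p(x|\theta) = P(X = x|\theta)$. Given any non-null $\theta$, the event "there is no annihilation" (at the last toss) can be written "$x \in A_\theta^+$" and we have

$$P(\text{"no annihilation"} \mid \theta) = P(X \in A_\theta^+ \mid \theta) = \begin{cases} 3/4 & \text{if } \theta \neq \emptyset, \\ 1 & \text{if } \theta = \emptyset. \end{cases} \tag{2}$$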


So, with probability greater than or equal to 3/4, the path $x$ will be longer than the path $\theta$, for any non-null path $\theta$. Intuitively, a good estimate of $\theta$ is $\hat\theta = x^-$, the only $\theta$ for which there is no annihilation.
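As an illustration of (1) and (2), the following Python sketch encodes paths as strings in which `A` and `B` stand for $a^{-1}$ and $b^{-1}$ (an encoding assumed here only for illustration; the function names are also ours). It records a toss with the annihilation rule, computes $P(\text{"no annihilation"} \mid \theta)$ exactly by enumerating the four faces, and checks that $x^-$ recovers $\theta$ whenever there is no annihilation.

```python
from fractions import Fraction

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def toss(path: str, letter: str) -> str:
    """Record one toss: cancel the last letter if `letter` is its inverse, else append."""
    if path and path[-1] == letter.swapcase():
        return path[:-1]  # annihilation
    return path + letter


def p_no_annihilation(theta: str) -> Fraction:
    """Exact P("no annihilation" | theta) for the supplementary toss (each face has prob. 1/4)."""
    hits = sum(len(toss(theta, c)) == len(theta) + 1 for c in LETTERS)
    return Fraction(hits, 4)


def intuitive_estimate(x: str) -> str:
    """The intuitive estimator x^-: drop the last letter (undefined for the null path)."""
    if not x:
        raise ValueError("x^- is not defined for the null path")
    return x[:-1]


theta = "abaa"
print(p_no_annihilation(theta))  # 3/4, as in (2)
print(p_no_annihilation(""))     # 1 for the null path
# x^- recovers theta for the three faces that do not annihilate: [True, True, False, True]
print([intuitive_estimate(toss(theta, c)) == theta for c in LETTERS])
```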

This statistical model was first proposed by Lehmann (1959) to give an example of a best equivariant estimator which is not admissible. Identifying $\Theta$ with the free group generated by $a$ and $b$, the minimax equivariant estimator of $\theta$ under the 0/1 loss function is any $\tilde\theta$ such that $\tilde\theta(x)$ belongs to $A_x$. However, the intuitive estimator $\hat\theta = x^-$ uniformly dominates $\tilde\theta$ for the associated risk function.


Stone proposed a Bayesian version of this inferential problem and put a flat prior $\pi(\theta) \propto 1$ on $\Theta$, which corresponds to the right Haar measure of the free group, known to be associated with the best equivariant estimator. The posterior distribution is therefore

$$\pi(\theta|x) \propto l(\theta; x)\,\pi(\theta) \propto \mathbb{1}_{\theta \in A_x}, \quad\text{i.e.}\quad \pi(\theta|x) = \tfrac{1}{4}\,\mathbb{1}_{\theta \in A_x}.$$

Given a non-null path $x$, the event "no annihilation" can be written "$\theta = x^-$" or "$x \in A_\theta^+$", with posterior probability

$$P(\text{"no annihilation"} \mid x) = \pi(\theta = x^- \mid x) = \begin{cases} 1/4 & \text{if } x \neq \emptyset, \\ 0 & \text{if } x = \emptyset. \end{cases} \tag{3}$$

The inconsistency between (2) and (3), named "the Flatland paradox" after Stone (1976), is an example of non-conglomerability (de Finetti, 1972; Kadane et al., 1986). We have simultaneously $P(\text{"no annihilation"} \mid \theta) \geq 3/4$ for any $\theta$ and $P(\text{"no annihilation"} \mid x) \leq 1/4$ for any $x$. More generally, a non-conglomerability phenomenon occurs for some event, say $A$, if there exists $0 < \alpha < 1$ such that

$$P(A|x) < \alpha, \quad \forall x, \tag{4}$$
$$\text{and} \quad P(A|\theta) > \alpha, \quad \forall \theta. \tag{5}$$
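A minimal sketch of (3), (4) and (5) under the flat prior, using the same assumed string encoding (`A` for $a^{-1}$, `B` for $b^{-1}$) and hypothetical function names. The posterior is computed by enumerating $A_x$; the two conditions are checked with $\alpha = 1/2$ only on the finitely many paths of length at most 4, so this is a sanity check rather than a proof.

```python
from fractions import Fraction
from itertools import product

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def is_reduced(path: str) -> bool:
    """True if no letter is immediately followed by its inverse."""
    return all(u != v.swapcase() for u, v in zip(path, path[1:]))


def paths_up_to(max_len: int) -> list[str]:
    """All reduced paths of length 0..max_len (brute force, only for small lengths)."""
    words = ("".join(w) for n in range(max_len + 1) for w in product(LETTERS, repeat=n))
    return [w for w in words if is_reduced(w)]


def candidates(x: str) -> list[str]:
    """A_x: the four values of theta that can produce x with one extra toss."""
    out = [x[:-1]] if x else []  # theta = x^-: the last toss did not annihilate
    out += [x + c for c in LETTERS if not x or x[-1] != c.swapcase()]  # theta in A_x^+
    return out


def flat(theta: str) -> int:
    """Flat prior pi(theta) = 1, defined up to a scalar factor."""
    return 1


def p_noann_given_x(x: str, prior) -> Fraction:
    """Posterior P("no annihilation" | x) = pi(theta = x^- | x); the likelihood 1/4 cancels."""
    if not x:
        return Fraction(0)
    w = {t: Fraction(prior(t)) for t in candidates(x)}
    return w[x[:-1]] / sum(w.values())


def p_noann_given_theta(theta: str) -> Fraction:
    """P("no annihilation" | theta): the extra face is not the inverse of the last letter."""
    if not theta:
        return Fraction(1)
    return Fraction(sum(c != theta[-1].swapcase() for c in LETTERS), 4)


alpha = Fraction(1, 2)
paths = paths_up_to(4)
print(all(p_noann_given_x(x, flat) < alpha for x in paths))  # condition (4) on these x
print(all(p_noann_given_theta(t) > alpha for t in paths))    # condition (5) on these theta
```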


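Of course, if the prior distribution $\pi$ were a probability distribution, such an inconsistency could not occur since

$$\int P(A|x)\, p(x)\, dx = \int P(A|\theta)\, \pi(\theta)\, d\theta,$$

where $p(x) = \int p(x|\theta)\, \pi(\theta)\, d\theta$ is the marginal probability distribution of $x$: by (4) the left-hand side would be smaller than $\alpha$, whereas by (5) the right-hand side would be larger than $\alpha$. Note that if $\pi$ is improper, then $p(x)$ is no longer a probability distribution.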
Other examples of non-conglomerability and their analysis using finitely additive probability can be found in Kadane et al. (1986).


In the following, we give new insights into this phenomenon. We denote by $\ell(\theta)$ the length of $\theta$, by $\ell(x)$ the length of $x$ and by $n_\ell$ the number of paths of length $\ell$. We have $n_0 = 1$, $n_1 = 4$ and $n_\ell = 4 \times 3^{\ell-1}$, $\ell \geq 2$. The importance of examining the distribution of $\ell(\theta)$ was first pointed out by Hill (1980) and will be the key point for one part of the explanation of the paradox.
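The counts $n_\ell$ can be checked by brute force. The sketch below, with the same assumed encoding (`A` for $a^{-1}$, `B` for $b^{-1}$) and hypothetical names, enumerates all words of a given length over the four faces, keeps those with no consecutive inverse symbols, and compares the count with the closed form.

```python
from itertools import product

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def n_paths_brute(length: int) -> int:
    """Count words of the given length with no letter followed by its inverse."""
    return sum(
        all(u != v.swapcase() for u, v in zip(w, w[1:]))
        for w in product(LETTERS, repeat=length)
    )


def n_paths(length: int) -> int:
    """Closed form: n_0 = 1 and n_l = 4 * 3**(l - 1) for l >= 1."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


for l in range(7):
    assert n_paths_brute(l) == n_paths(l)
print([n_paths(l) for l in range(5)])  # [1, 4, 12, 36, 108]
```

The brute-force loop grows like $4^\ell$, so it is only meant as a check for small lengths.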


3 A 2-dimensional re-parameterization approach 


In this section, we show that a flat prior on $\theta$ induces a highly informative prior distribution on $\ell(\theta)$. This prior does not correspond to the intuitive approach, which suggests a flat prior on $\ell(\theta)$. Therefore, we propose another improper prior on $\theta$ that considers $\ell(\theta)$ as a parameter of interest and the specific path of a given length as a nuisance parameter. For this prior, the paradox disappears.

To explore the features of an improper prior $\pi$, we define risk ratios, or relative weights, of any two finite events $A$ and $B$:

$$RR(A; B) = \frac{\pi(A)}{\pi(B)}.$$


It is worth noting that $RR(A; B)$ does not depend on the arbitrarily chosen scalar factor in the definition of $\pi$. The improper prior distribution on $\ell(\theta)$ derived from the flat prior on $\theta$ is, for $\ell \geq 1$,

$$\pi(\ell(\theta) = \ell) \propto n_\ell = 4 \times 3^{\ell-1} \propto 3^\ell, \tag{6}$$


and therefore, for $k \geq 2$, we can define a risk ratio

$$RR(\ell(\theta) = k+1; \ell(\theta) = k-1) = \frac{\pi(\ell(\theta) = k+1)}{\pi(\ell(\theta) = k-1)} = \frac{n_{k+1}}{n_{k-1}} = 9. \tag{7}$$


So, there is 9 times more "chance" that $\ell(\theta)$ is equal to $k+1$ rather than $k-1$; from another point of view, the prior puts a weight 9 times larger on $k+1$ than on $k-1$ for $\ell(\theta)$.
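The risk ratio (7) only needs the counts $n_\ell$. The following sketch (function names are ours, for illustration) computes it with exact fractions from the prior on $\ell(\theta)$ induced by the flat prior, and shows that rescaling the prior by an arbitrary factor leaves it unchanged.

```python
from fractions import Fraction


def n_paths(length: int) -> int:
    """n_l: number of paths of length l."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


def flat_prior_on_length(length: int) -> int:
    """Unnormalised prior on l(theta) induced by pi(theta) = 1: its weight is n_l, as in (6)."""
    return n_paths(length)


def risk_ratio(weight, a: int, b: int) -> Fraction:
    """RR(l(theta) = a; l(theta) = b) = pi(l = a) / pi(l = b)."""
    return Fraction(weight(a), weight(b))


print([risk_ratio(flat_prior_on_length, k + 1, k - 1) for k in range(2, 6)])  # all equal to 9
# any scalar factor in the prior cancels in the ratio
print(risk_ratio(lambda l: 7 * n_paths(l), 5, 3))
```

For $k = 1$ the same ratio is $n_2/n_0 = 12$ rather than 9, since $n_0 = 1$ does not follow the closed form; this is why (7) is stated for $k \geq 2$.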
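Let us rewrite (2) and (3), for non-null $\theta$ and $x$, respectively as:

$$ODD(\text{"no annihilation"} \mid \theta) = \frac{P(\text{"no annihilation"} \mid \theta)}{P(\text{"annihilation"} \mid \theta)} = \frac{P(X \in A_\theta^+ \mid \theta)}{P(X = \theta^- \mid \theta)} = 3 \tag{8}$$

and

$$ODD(\text{"no annihilation"} \mid x) = \frac{P(\text{"no annihilation"} \mid x)}{P(\text{"annihilation"} \mid x)} = \frac{\pi(\theta = x^- \mid x)}{\pi(\theta \in A_x^+ \mid x)} = \frac{1}{3}. \tag{9}$$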

Note that $P(\cdot)$ refers to the probability of events that involve both $X$ and $\theta$. So, the inconsistency between (2) and (3) corresponds to the factor 9 between (8) and (9). To see that this factor comes from (7), let us restate the reasoning in terms of $\ell$. For $\ell(\theta) \geq 1$, the probability that, for a given $\theta$, there will be no annihilation can be written

$$P(\ell(X) = \ell(\theta) + 1 \mid \theta) = \frac{3}{4}.$$

Since this expression depends on $\theta$ only through $\ell(\theta)$, we have

$$P(\ell(X) = \ell(\theta) + 1 \mid \ell(\theta)) = \frac{3}{4}$$

or, equivalently, for $k \geq 1$,

$$\frac{P(\ell(X) = k+1 \mid \ell(\theta) = k)}{P(\ell(X) = k-1 \mid \ell(\theta) = k)} = 3.$$

Now, for $k \geq 2$, the posterior relative risk of no annihilation vs annihilation is

$$\frac{\pi(\ell(\theta) = k-1 \mid \ell(x) = k)}{\pi(\ell(\theta) = k+1 \mid \ell(x) = k)} = \frac{P(\ell(X) = k \mid \ell(\theta) = k-1)\, \pi(\ell(\theta) = k-1)}{P(\ell(X) = k \mid \ell(\theta) = k+1)\, \pi(\ell(\theta) = k+1)} = \frac{3}{9} = \frac{1}{3}.$$
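The computation above works entirely on lengths. The sketch below reproduces it with exact fractions, under the assumption (ours) that the prior on $\ell(\theta)$ is passed as an unnormalised weight function; the names are hypothetical.

```python
from fractions import Fraction


def n_paths(length: int) -> int:
    """n_l: number of paths of length l."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


def p_len_x_given_len_theta(kx: int, kt: int) -> Fraction:
    """P(l(X) = kx | l(theta) = kt): one extra toss changes the length by exactly +1 or -1."""
    if kt == 0:
        return Fraction(1) if kx == 1 else Fraction(0)
    if kx == kt + 1:
        return Fraction(3, 4)  # no annihilation
    if kx == kt - 1:
        return Fraction(1, 4)  # annihilation
    return Fraction(0)


def posterior_odds_no_annihilation(k: int, length_weight) -> Fraction:
    """pi(l(theta) = k-1 | l(x) = k) / pi(l(theta) = k+1 | l(x) = k), for k >= 2."""
    num = p_len_x_given_len_theta(k, k - 1) * length_weight(k - 1)
    den = p_len_x_given_len_theta(k, k + 1) * length_weight(k + 1)
    return num / den


print(posterior_odds_no_annihilation(5, n_paths))      # flat prior on theta: 1/3
print(posterior_odds_no_annihilation(5, lambda l: 1))  # constant weight on l(theta): 3
```

The likelihood ratio contributes the factor 3 and the prior ratio the factor $1/9$; replacing the weight $n_\ell$ by a constant removes the factor $1/9$, which anticipates the prior introduced next.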

It can be seen how the prior distribution involved in this formula influences the posterior distribution. So, the inconsistency between the intuitive and Bayesian solutions comes from the fact that the prior is highly informative on the parameter of interest $\ell(\theta)$. To show that the paradox is not directly related to the improperness of the prior but to its construction, we propose another improper prior, say $\tilde\pi$, which is implicitly used in the intuitive reasoning and for which the inconsistency disappears.

We do not know $\ell(\theta)$, but we just think that $\ell(\theta)$ is probably large and we implicitly assume, as prior knowledge, that the event "$\ell(\theta) = k$" is almost as likely as the events "$\ell(\theta) = k - 1$" or "$\ell(\theta) = k + 1$", especially for large values of $k$; so we put, as an approximation, a flat prior on $\ell(\theta)$. Knowing the length $\ell$ of $\theta$ and using symmetry arguments, any path of length $\ell$ has the same probability to be drawn, which is equal to $1/n_\ell$ for $\ell \geq 1$. The resulting prior on $\theta$ is, for $\ell(\theta) \geq 1$,

$$\tilde\pi(\theta) \propto \tilde\pi(\theta \mid \ell(\theta))\, \tilde\pi(\ell(\theta)) \propto \frac{1}{n_{\ell(\theta)}} \times 1 \propto 3^{-\ell(\theta)}. \tag{10}$$


The prior $\tilde\pi$ can also be obtained by using reference priors with nuisance parameters as proposed by Bernardo (1979), Berger and Bernardo (1992) or Kass and Wasserman (1996): the parameter $\theta$ can be split into two parameters, $\theta = (\ell, \eta)$, where $\ell$ is the length of the path and $\eta = 1, \ldots, n_\ell$ is the index of the path among the paths of length $\ell$. We can consider $\ell$ as the parameter of interest and $\eta$ as a nuisance parameter. The reference prior on $\ell$ is the flat prior and we know exactly the distribution of $\eta \mid \ell$, which is the uniform distribution over $\{1, 2, \ldots, n_\ell\}$. The resulting prior is therefore $\tilde\pi$.

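For $\ell(x) \geq 2$, the posterior distribution is

$$\tilde\pi(\theta \mid x) \propto 3^{-\ell(\theta)}\, \mathbb{1}_{\theta \in A_x}, \quad\text{i.e.}\quad \tilde\pi(\theta = x^- \mid x) = \frac{3}{4} \quad\text{and}\quad \tilde\pi(\theta = y \mid x) = \frac{1}{12} \ \text{for each } y \in A_x^+,$$

and therefore

$$\frac{P(\text{"no annihilation"} \mid x)}{P(\text{"annihilation"} \mid x)} = \frac{\tilde\pi(\theta = x^- \mid x)}{\tilde\pi(\theta \in A_x^+ \mid x)} = \frac{3^{-(\ell(x)-1)}}{3 \times 3^{-(\ell(x)+1)}} = 3,$$

which matches (8), or equivalently $P(\text{"no annihilation"} \mid x) = 3/4$, which matches (2). So, the intuitive reasoning is in fact a Bayesian reasoning using the improper distribution (10), for which the paradox disappears. We may note that a similar effect is seen in the marginalization paradox by Stone and Dawid (1972, Example 1), where the paradox disappears when using another improper prior.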
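The posterior under $\tilde\pi$ can be checked directly on paths. The sketch below implements $\tilde\pi(\theta) = 1/n_{\ell(\theta)}$ (flat on the length, uniform within a length), which agrees with (10) up to a scalar factor for $\ell(\theta) \geq 1$; the string encoding (`A` for $a^{-1}$, `B` for $b^{-1}$) and the function names are assumptions made for illustration.

```python
from fractions import Fraction

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def n_paths(length: int) -> int:
    """n_l: number of paths of length l."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


def ref_prior(theta: str) -> Fraction:
    """pi~(theta): flat on the length l, uniform over the n_l paths of that length."""
    return Fraction(1, n_paths(len(theta)))


def candidates(x: str) -> list[str]:
    """A_x: x^- (if x is non-null) and the three paths in A_x^+."""
    out = [x[:-1]] if x else []
    out += [x + c for c in LETTERS if not x or x[-1] != c.swapcase()]
    return out


def posterior(x: str, prior) -> dict[str, Fraction]:
    """Normalised posterior on A_x; the constant likelihood 1/4 cancels."""
    w = {t: prior(t) for t in candidates(x)}
    z = sum(w.values())
    return {t: v / z for t, v in w.items()}


post = posterior("abaa", ref_prior)
print(post["aba"])    # theta = x^-: 3/4
print(post["abaaa"])  # each path of A_x^+: 1/12
```

For $\ell(x) = 1$ the candidate $x^- = \emptyset$ has weight $1/n_0 = 1$, so this prior gives a posterior probability of no annihilation of 4/5 rather than 3/4, which is consistent with the restriction $\ell(x) \geq 2$ above.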


Remark: Jaynes (2003, p. 453) tried to explain the paradox by exploring the link between the priors on $N$, the number of tosses, and the priors on $\theta$. We think that this leads to unnecessary complications since the problem can be restated as follows: from the second toss onwards, instead of drawing at random one of the four faces of the tetrahedral die, we draw at random, with equal probability, one of the three letters that are not the inverse of the letter appearing in the previous outcome. Only at the supplementary toss that generates $x$ is one letter among the four possible ones drawn at random, which may lead to an annihilation. In that case, $\ell(\theta) = N$ and the paradox, i.e. the inconsistency between (2) and (3), remains the same.


4 An approach by a limiting argument 

In this section, we propose another approach to the problem by considering limits of proper priors. Let us replace the flat prior $\pi$ by $\pi_M$, the uniform prior on the paths of length lower than or equal to $M$. When $M$ is large, it is commonly admitted that $\pi_M$ is an approximation of the flat prior. Since the number of paths with lengths between 0 and $M$ is equal to $\sum_{\ell=0}^{M} n_\ell = 2 \times 3^M - 1$, $\pi_M(\theta)$ is defined by

$$\pi_M(\theta) = \frac{\mathbb{1}_{\ell(\theta) \leq M}}{2 \times 3^M - 1}. \tag{12}$$

Equivalently, $\pi_M$ can be seen as the result of a two-step random procedure corresponding to the parameterization $(\ell, \eta)$ described in Section 3:
- step 1: draw at random a length $\ell$ according to the distribution $\pi_M(\ell) = \frac{4 \times 3^{\ell-1}}{2 \times 3^M - 1}$ if $1 \leq \ell \leq M$ and $\pi_M(0) = \frac{1}{2 \times 3^M - 1}$;
- step 2: draw at random a path $\theta$ of length $\ell$ from a uniform prior: $\pi(\theta \mid \ell) = \frac{\mathbb{1}_{\ell(\theta) = \ell}}{4 \times 3^{\ell-1}}$ if $\ell \neq 0$. If $\ell = 0$, $\theta$ is the null path with probability 1.
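The equivalence between (12) and the two-step procedure can be checked directly. In this sketch (hypothetical names, same assumed encoding with `A` for $a^{-1}$ and `B` for $b^{-1}$), step 2 draws the first letter among four and each later letter among the three that do not cancel the previous one, which gives probability $\frac{1}{4} \times (1/3)^{\ell-1} = 1/n_\ell$ to every path of length $\ell$.

```python
import random
from fractions import Fraction

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def n_paths(length: int) -> int:
    """n_l: number of paths of length l."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


def pi_M(theta: str, M: int) -> Fraction:
    """Equation (12): uniform prior on the 2 * 3**M - 1 paths of length <= M."""
    return Fraction(1, 2 * 3 ** M - 1) if len(theta) <= M else Fraction(0)


def pi_M_two_steps(theta: str, M: int) -> Fraction:
    """Same prior written as pi_M(l) * pi(theta | l) (steps 1 and 2)."""
    l = len(theta)
    if l > M:
        return Fraction(0)
    p_length = Fraction(n_paths(l), 2 * 3 ** M - 1)  # step 1
    return p_length * Fraction(1, n_paths(l))         # step 2


def sample_pi_M(M: int, rng: random.Random) -> str:
    """Draw theta from pi_M with the two-step procedure."""
    l = rng.choices(range(M + 1), weights=[n_paths(k) for k in range(M + 1)])[0]
    path = ""
    for _ in range(l):
        # first letter: 4 choices; later letters: the 3 that are not the inverse of the last one
        allowed = [c for c in LETTERS if not path or c != path[-1].swapcase()]
        path += rng.choice(allowed)
    return path


print(sum(n_paths(l) for l in range(6)), 2 * 3 ** 5 - 1)  # both 485
print(pi_M_two_steps("abaa", 5) == pi_M("abaa", 5))      # True
print(sample_pi_M(5, random.Random(0)))                  # one random path of length <= 5
```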

To understand the notion of approximation and its implications when $M$ goes to $+\infty$, it is necessary to distinguish between two Bayesian paradigms, corresponding respectively to a subjective and an objective approach.

Paradigm 1 (subjective approach): we assume that $\theta$ is drawn at random according to $\pi_M$. Therefore, $\theta$ can be considered as a random effect with known distribution $\pi_M$. We may note that an improper prior $\pi$ is not relevant in this approach since it is not a probability distribution. The relevant model for $X$ is the marginal model $p_M(x) = \sum_\theta p(x|\theta)\, \pi_M(\theta)$. Changing $M$ implies that the way $x$ is generated also changes, which means that it is irrelevant to consider the limit of the posterior distribution with respect to $\pi_M$ for $x$ fixed. Therefore, in the limiting argument, it is essential to consider the behaviour of $X$.

Since $\pi_M$ is a probability, we have, by (2):

$$P_M(\text{"no annihilation"}) = \sum_\theta \pi_M(\theta)\, P(\text{"no annihilation"} \mid \theta) \tag{13}$$
$$= \frac{3}{4} + \frac{\pi_M(\emptyset)}{4}, \tag{14}$$

where $\pi_M(\emptyset)$ is negligible for large $M$. Clearly, we also have

$$P_M(\text{"no annihilation"}) = \sum_x p_M(x)\, P_M(\text{"no annihilation"} \mid x). \tag{15}$$

From (14) and (15), on average over $x$, $P_M(\text{"no annihilation"} \mid x)$ is almost equal to 3/4, which corresponds to the intuitive reasoning and the standard probability rules. Knowing $x$ gives more information:

$$P_M(\text{"no annihilation"} \mid x) = \begin{cases} 1 & \text{if } \ell(x) = M \text{ or } M+1, \\ 1/4 & \text{if } 1 \leq \ell(x) \leq M-1, \\ 0 & \text{if } \ell(x) = 0. \end{cases} \tag{16}$$

By a straightforward calculation, $P_M(\ell(X) = M \text{ or } M+1) \approx 2/3$ and $P_M(1 \leq \ell(X) < M) \approx 1/3$. So, for the prior $\pi_M$, the difference between (2) and (3), which was called an inconsistency for the flat prior, remains, but only when $\ell(x) < M$, which occurs with probability $\approx 1/3$. This can be explained intuitively, as in the improper case, by the fact that there are 9 times more chances that $\ell(\theta) = \ell(x) + 1$ rather than $\ell(\theta) = \ell(x) - 1$ when $1 < \ell(x) \leq M - 1$. So, given $x$, it is more likely that $\ell(\theta) = \ell(x) + 1$ with annihilation than $\ell(\theta) = \ell(x) - 1$ without annihilation.
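The statements around (14) to (16) can be reproduced exactly by working on lengths only, since under $\pi_M$ both the law of $\ell(X)$ and the right-hand side of (16) depend on $x$ only through $\ell(x)$. A sketch with exact fractions, where the value $M = 10$ and the function names are arbitrary choices made for illustration:

```python
from fractions import Fraction


def n_paths(length: int) -> int:
    """n_l: number of paths of length l."""
    return 1 if length == 0 else 4 * 3 ** (length - 1)


def length_law_X(M: int) -> dict[int, Fraction]:
    """P_M(l(X) = k) when theta ~ pi_M and x is one extra toss away from theta."""
    z = 2 * 3 ** M - 1
    law: dict[int, Fraction] = {}
    for j in range(M + 1):
        p_j = Fraction(n_paths(j), z)  # P_M(l(theta) = j)
        moves = {1: Fraction(1)} if j == 0 else {j + 1: Fraction(3, 4), j - 1: Fraction(1, 4)}
        for k, q in moves.items():
            law[k] = law.get(k, Fraction(0)) + p_j * q
    return law


def p_noann_given_len_x(k: int, M: int) -> Fraction:
    """Right-hand side of (16) as a function of k = l(x)."""
    if k == 0:
        return Fraction(0)
    return Fraction(1) if k >= M else Fraction(1, 4)


M = 10
law = length_law_X(M)
top = law[M] + law[M + 1]                  # close to 2/3
middle = sum(law[k] for k in range(1, M))  # close to 1/3
p_noann = sum(law[k] * p_noann_given_len_x(k, M) for k in law)  # equation (15)
pi_M_null = Fraction(1, 2 * 3 ** M - 1)
print(float(top), float(middle))
print(p_noann == Fraction(3, 4) + pi_M_null / 4)  # agrees with (14): True
```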

Now, let $M$ go to $+\infty$. As mentioned above, considering the limit of $P_M(\text{"no annihilation"} \mid x)$, or more generally of $\pi_M(\cdot \mid x)$, for $x$ fixed is not relevant: as $M$ gets larger, the values of $\ell(x)$ get larger with a high probability. For example, it is not possible to replace $\pi_M$ by $\pi$ in (15) for the limiting expression, since this gives $+\infty$ whereas the limit is 3/4. In Eq. (15), $p(x)$ is formally equal to 1 for the flat prior, and therefore is not defined as a probability distribution on $x$ (see Taraldsen and Lindqvist, 2010). This illustrates the fact that the flat prior cannot be considered as the limiting case of inference with $\pi_M$ and that limiting arguments are not valid here. So, when $M$ goes to $+\infty$, we can only say that $\ell(X)$ goes to $+\infty$, in the sense that $P_M(\ell(X) < k)$ goes to 0 for any fixed $k$. The most probable case, that is "$\ell(X) = M$ or $M + 1$", which corresponds to "no annihilation" with conditional probability 1, varies with the prior, as pointed out by Stone (1982).


Paradigm 2 (objective approach): we consider that there is some $\theta$ such that $x$ has been generated according to $p(x|\theta)$. The relevant model for $X$ in this paradigm is the conditional model $p(x|\theta) = l(\theta; x)$ rather than the marginal model $p_M(x)$ used in the subjective approach. Here, $\theta$ is considered as a fixed parameter rather than a random effect. In that case, $\pi_M$ is not the actual way to generate $\theta$, but a way to make inference on $\theta$. To cite J.M. Bernardo (in Irony and Singpurwalla, 1997), "one should not interpret any non-subjective prior as a probability distribution". So, it is not relevant in this paradigm to give an interpretation of the marginal distribution $p_M(x)$ of $X$, nor of the joint distribution of $(X, \theta)$ based on $\pi_M$. Therefore, it is irrelevant to consider $P_M(\text{"no annihilation"})$ in (13), which is related to the joint distribution. Changing the prior distribution will not change the way $x$ is generated, but will only change the posterior distribution and the related inference on $\theta$. It is therefore relevant here to consider the limit of the posterior distribution for $x$ fixed. Moreover, for any scalar $\alpha > 0$, $\pi$ and $\alpha\pi$ give the same posterior distribution, which means that prior distributions are defined up to a scalar factor. This leads us to consider prior distributions in a projective space of measures and not as probabilities (see Bioche and Druilhet, 2016, or Taraldsen and Lindqvist, 2016).


In the projective space, improper priors appear naturally as limits of sequences of proper priors for the corresponding convergence mode, named q-vague convergence (Bioche and Druilhet, 2016): a sequence $\{\pi_M\}_{M \in \mathbb{N}}$ of discrete priors is said to converge q-vaguely to the discrete prior $\pi$ if there exist some scalars $a_M$ such that $a_M \pi_M(\theta)$ converges to $\pi(\theta)$ for any $\theta$. Here, choosing $a_M = 2 \times 3^M - 1$, we have $a_M \pi_M(\theta) = 1$ as soon as $M \geq \ell(\theta)$, so $\pi_M$ converges q-vaguely to the flat prior $\pi$. Therefore, contrary to Paradigm 1, $\pi_M$ can be seen as an approximation of the flat prior. Note that the q-vague convergence of $\pi_M$ implies the convergence of the posterior distribution for $x$ fixed. For example, from (16), $\lim_{M \to +\infty} P_M(\text{"no annihilation"} \mid x) = 1/4$ for any non-null $x$, which matches $P(\text{"no annihilation"} \mid x)$ under the flat prior. This illustrates the fact that limiting arguments for $x$ fixed are valid in the second paradigm.
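In the second paradigm, limits are taken with $x$ fixed. The sketch below (same assumed encoding, hypothetical names) keeps $\theta = abaa$ and $x = abaab$ fixed and computes, for increasing $M$, the rescaled prior $a_M \pi_M(\theta)$ and the posterior probability of no annihilation under $\pi_M$. The posterior equals 1 while $\ell(x) \in \{M, M+1\}$ and is 1/4 once $M \geq \ell(x) + 1$, as in (16); for $M < \ell(x) - 1$ the observation has zero marginal probability, so the sketch raises an error.

```python
from fractions import Fraction

LETTERS = "abAB"  # assumption: "A" encodes a^{-1} and "B" encodes b^{-1}


def pi_M(theta: str, M: int) -> Fraction:
    """Equation (12): uniform prior on the paths of length <= M."""
    return Fraction(1, 2 * 3 ** M - 1) if len(theta) <= M else Fraction(0)


def candidates(x: str) -> list[str]:
    """A_x: x^- (if x is non-null) and the three paths in A_x^+."""
    out = [x[:-1]] if x else []
    out += [x + c for c in LETTERS if not x or x[-1] != c.swapcase()]
    return out


def p_noann_given_x(x: str, M: int) -> Fraction:
    """Posterior probability that theta = x^- under pi_M, for a fixed observation x."""
    if not x:
        return Fraction(0)
    w = {t: pi_M(t, M) for t in candidates(x)}
    total = sum(w.values())
    if total == 0:
        raise ValueError("x has zero marginal probability under pi_M")
    return w[x[:-1]] / total


theta, x = "abaa", "abaab"
for M in (4, 5, 6, 8, 12):
    a_M = 2 * 3 ** M - 1  # rescaling constant of the q-vague convergence
    print(M, a_M * pi_M(theta, M), p_noann_given_x(x, M))
```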

We now justify the fact that, in the second paradigm, the improperness of the prior is not directly involved in the inconsistency, by showing that the inconsistency remains with proper priors sufficiently close to the flat prior. In practice, a statistician has a vague idea of the range in which $\theta$ lies and, instead of choosing a flat prior, will choose a prior of type $\pi_M$. If he thinks that the length of $\theta$ is probably not greater than some hundreds of thousands, he will choose $M$ equal to some millions in order to be sure to encompass the true value of $\theta$ with sufficient confidence. He implicitly assumes that the precise choice of $M$ will not have a great influence on the results, since he does not assume that the prior represents the actual way to draw $\theta$. This is the case here since, provided that $\ell(\theta) < M - 1$, the posterior distribution $\pi_M(\theta \mid x)$ does not depend on $M$. So, if the statistician is almost certain that $\ell(\theta) < M - 1$, then he is also almost certain that $\ell(x) < M$. So, for those $x$, the inconsistency between the intuitive and the Bayesian answers remains. Non-conglomerability is also achieved by a proper prior if we replace the condition "$\forall x$" in (4) by the condition "for any expected $x$". We see again that the inconsistency is due to the inappropriate choice of the flat prior as prior knowledge, as explained in Section 3, rather than to its improperness.

5 Discussion 

The Flatland paradox is a striking example where a flat prior or a right Haar measure on a discrete parameter cannot be considered as a non-informative prior or as a reflection of ignorance. This is mainly due to the structure of the parameter space. As described by Abbott (1884), changing the dimension of a space may change the shape of its subsets, and mountains may appear flat. This is the case here: the parameter space $\Theta$ can be defined in one dimension, $\Theta = \{\theta_n, n \in \mathbb{N}\}$, or in two dimensions, $\Theta = \{(\ell, \eta);\ 1 \leq \eta \leq n_\ell,\ \ell \in \mathbb{N}\}$. A flat prior on $\theta$ gives an exponentially increasing prior for $\ell$. The fact that a flat prior on $\ell$ gives a satisfactory answer shows that there does not exist an automated way to choose a prior when no information is available or when we want to ignore the partial information we have.

The Flatland paradox, like many other paradoxes or inconsistencies that arise in Bayesian inference with improper priors, suggests that there is a gap between proper and improper priors. On the other hand, improper priors are often considered as limits of proper priors. Rather than considering proper vs improper priors, we prefer to make a distinction between two different Bayesian paradigms. Each paradigm has its own rules, and mixing the rules of one paradigm with those of the other generates paradoxes. This is the case for limiting arguments, which are quite different from one paradigm to the other.

In the first paradigm, $\theta$ is considered as a random effect and the prior distribution should be considered as a way to draw the parameter. The prior distribution must be a probability distribution and improper priors should be excluded. The relevant model for the data is the marginal model, and the non-conglomerability phenomenon cannot occur according to standard probability rules. Limiting arguments with respect to the prior distribution should take into account the fact that the marginal distribution of $x$ also changes, and improper priors do not appear as limits of proper priors, as we have shown.

In the second paradigm, $\theta$ is an unknown parameter and the relevant model for $x$ is the conditional model. Prior distributions should be considered in a projective space of measures, i.e. defined up to a scalar factor, rather than as probability distributions. There is no reason in this paradigm to exclude improper priors, which appear naturally as limits of proper priors (Bioche and Druilhet, 2016, Theorem 2.6) independently of the statistical model. The rules associated with projective spaces are quite different from those associated with probability distributions, and the non-conglomerability phenomenon may occur. Limiting arguments should be considered with $x$ fixed. The convergence mode associated with the projective space is the q-vague convergence, which can explain, for example, the Lindley paradox (Bioche and Druilhet, 2016).


A general consequence of this approach is that inconsistencies in the limit may arise in equations involving the joint or the predictive distributions when we replace a sequence of proper priors $\pi_n$ by its limit $\pi$ in the sense of q-vague convergence. Indeed, joint or predictive distributions are associated with the first paradigm, whereas q-vague convergence is associated with the second one. This is the case for the non-conglomerability example analysed in this paper, as e.g. in Eq. (15). However, the limit involving the posterior distribution with $x$ fixed is consistent, as in Eq. (16).

This approach suggests a general method to analyse inconsistencies or paradoxes. If the reasoning involves the joint or predictive distribution, then only proper priors that reflect subjective knowledge should be considered, and the relevant model is the marginal model. Improper priors are allowed only in an objective approach, where the relevant model is the conditional model and where improper priors can be considered as limits of proper priors.